[feature](be) Add adaptive batch size for scan path (#62835) #63005
Issue Number: None
Related PR: None
Problem Summary: Add adaptive block row prediction for SegmentIterator,
OLAP scan, file scan, and format readers. The scan path now uses a row
ceiling plus preferred output byte budget to reduce oversized blocks for
wide rows while preserving row-limited behavior for narrow rows. This
commit also introduces the shared session/config/thrift/runtime budget
plumbing used by later operators.
Adds adaptive batch size controls for scan output blocks:
preferred_block_size_bytes and preferred_max_column_in_block_size_bytes.
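For reviewers, the interaction between the row ceiling and the byte budget can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: `BlockBudgetSketch`, `RowBytesEstimator`, and `predict_next_rows` are hypothetical names, the per-row estimate is a plain cumulative average, a budget of 0 is assumed to mean "disabled", and the per-column budget is assumed to apply to the widest column of a block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is not the Doris implementation.
struct BlockBudgetSketch {
    size_t row_ceiling;                   // maximum rows per output block
    uint64_t preferred_block_bytes;       // assumed: 0 disables this check
    uint64_t preferred_max_column_bytes;  // assumed: 0 disables this check
};

// Cumulative average of bytes per row, fed from blocks already produced.
class RowBytesEstimator {
public:
    void observe(size_t rows, uint64_t block_bytes, uint64_t widest_column_bytes) {
        if (rows == 0) return;  // an empty block carries no information
        _rows += rows;
        _block_bytes += block_bytes;
        _widest_column_bytes += widest_column_bytes;
    }
    bool has_sample() const { return _rows > 0; }
    double bytes_per_row() const {
        return static_cast<double>(_block_bytes) / static_cast<double>(_rows);
    }
    double widest_column_bytes_per_row() const {
        return static_cast<double>(_widest_column_bytes) / static_cast<double>(_rows);
    }

private:
    uint64_t _rows = 0;
    uint64_t _block_bytes = 0;
    uint64_t _widest_column_bytes = 0;
};

// Rows to request for the next block: the row ceiling, lowered when the
// estimated row width would push the block past either byte budget.
inline size_t predict_next_rows(const BlockBudgetSketch& budget,
                                const RowBytesEstimator& est) {
    assert(budget.row_ceiling > 0);
    // Assumption: with no history yet, fall back to the row ceiling.
    if (!est.has_sample()) return budget.row_ceiling;

    uint64_t rows = budget.row_ceiling;
    auto cap = [&rows](uint64_t byte_budget, double per_row_bytes) {
        if (byte_budget == 0 || per_row_bytes <= 0.0) return;
        double fit = static_cast<double>(byte_budget) / per_row_bytes;
        if (fit < static_cast<double>(rows)) rows = static_cast<uint64_t>(fit);
    };
    cap(budget.preferred_block_bytes, est.bytes_per_row());
    cap(budget.preferred_max_column_bytes, est.widest_column_bytes_per_row());

    // Always make progress, even when a single row exceeds the budget.
    return static_cast<size_t>(std::max<uint64_t>(rows, 1));
}
```

With narrow rows both byte caps stay above the row ceiling, so the result is the ceiling itself (the row-limited behavior). With wide rows the smaller of the two byte-derived limits wins, which is what keeps blocks from growing oversized.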
- Test: Unit Test
- Unit Test: ./run-be-ut.sh --run --filter=BlockBudgetTest.*:RuntimeStateBatchSizeTest.*:RuntimeStateBlockSizeBytesTest.*:RuntimeStateMaxColBytesTest.*:MockRuntimeStateBlockBudgetTest.*:AdaptiveBlockSizePredictorTest.*:BlockReaderBatchMaxRowsTest.*:EstimateCollectedEnoughTest.*:CollectedEnoughWithColumnsTest.*:BlockReaderByteBudgetTest.*:SegmentColumnRawDataBytesTest.*:CsvReaderSetBatchSizeTest.*:NewJsonReaderSetBatchSizeTest.*:OrcReaderTest.*:TableFormatReaderTest.*:ProfileSpecTest.*:LocalExchangerTest.*
- Behavior changed: Yes (scan output block sizing can now be byte-budget
limited when adaptive batch size is enabled)
- Does this need documentation: Yes
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 7b81191 to af17679.
Issue Number: None
Related PR: None
Problem Summary: Cluster-key MOW (merge-on-write) compaction sorts rows by cluster key, so duplicate unique keys may be non-adjacent and can remain visible in the output rowset. After compaction, scan the output rowset's primary key index and add delete bitmap entries, internal to the output rowset, for the older duplicate unique-key rows.
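The dedup pass described above can be illustrated with a small sketch. It is only an approximation under assumptions: `PkIndexEntry`, `DeleteBitmapSketch`, and `mark_older_duplicates` are hypothetical names, the primary key index is modeled as a vector already ordered by unique key (so equal keys are adjacent there even though the rows are not adjacent in cluster-key order), and "older" is decided by an assumed per-row `version` field.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one primary key index entry; not a Doris type.
struct PkIndexEntry {
    std::string unique_key;
    uint32_t segment_id;
    uint32_t row_id;
    int64_t version;  // assumed: larger means newer
};

// Stand-in for a delete bitmap: the set of (segment_id, row_id) to hide.
using DeleteBitmapSketch = std::set<std::pair<uint32_t, uint32_t>>;

// Walk the index in unique-key order; within each run of equal keys keep
// the newest row and mark every other row as deleted.
inline DeleteBitmapSketch mark_older_duplicates(
        const std::vector<PkIndexEntry>& index_in_key_order) {
    DeleteBitmapSketch deleted;
    const size_t n = index_in_key_order.size();
    size_t i = 0;
    while (i < n) {
        size_t newest = i;
        size_t j = i + 1;
        while (j < n &&
               index_in_key_order[j].unique_key == index_in_key_order[i].unique_key) {
            const PkIndexEntry& cand = index_in_key_order[j];
            const PkIndexEntry& best = index_in_key_order[newest];
            if (cand.version > best.version) {
                // The previous winner is now the older duplicate.
                deleted.emplace(best.segment_id, best.row_id);
                newest = j;
            } else {
                // On a version tie the earlier entry is kept (arbitrary choice).
                deleted.emplace(cand.segment_id, cand.row_id);
            }
            ++j;
        }
        i = j;  // jump to the first entry of the next key
    }
    return deleted;
}
```

Keys that appear once contribute nothing to the bitmap, so the pass only touches rows that compaction left duplicated.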
Release note: None
- Test: Unit Test
- Ran ./run-be-ut.sh --run --filter=VerticalCompactionTest.ClusterKeyMowCompactionNeedsOutputRowsetInternalDedup -j 8
- Behavior changed: No
- Does this need documentation: No
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
Pick PR: #62835